A Generalized Profile Syntax for Biomolecular Sequence Motifs and its Function in Automatic Sequence Interpretation

Authors

  • Philipp Bucher
  • Amos Bairoch
Abstract

A general syntax for expressing biomolecular sequence motifs is described, which will be used in future releases of the PROSITE data bank and in a similar collection of nucleic acid sequence motifs currently under development. The central part of the syntax is a regular structure which can be viewed as a generalization of the profiles introduced by Gribskov and coworkers. Accessory features implement specific motif search strategies and provide information helpful for the interpretation of predicted matches. Two contrasting examples, representing E. coli promoters and SH3 domains respectively, are shown to demonstrate the versatility of the syntax and its compatibility with diverse motif search methods. It is argued that a comprehensive machine-readable motif collection based on the new syntax, in conjunction with a standard search program, can serve as a general-purpose sequence interpretation and function prediction tool.

Similar resources

A Flexible Motif Search Technique Based on Generalized Profiles

A flexible motif search technique is presented which has two major components: (1) a generalized profile syntax serving as a motif definition language; and (2) a motif search method specifically adapted to the problem of finding multiple instances of a motif in the same sequence. The new profile structure, which is the core of the generalized profile syntax, combines the functions of a variety ...

A generalization of Profile Hidden Markov Model (PHMM) using one-by-one dependency between sequences

The Profile Hidden Markov Model (PHMM) can be poor at capturing dependency between observations because of the statistical assumptions it makes. To overcome this limitation, the dependency between residues in a multiple sequence alignment (MSA) which is the representative of a PHMM can be combined with the PHMM. Based on the fact that sequences appearing in the final MSA are written based on th...

Paleoenvironmental Reconstruction of Miocene Surma Succession in the Well Rashidpur # 04 of Bengal Basin Using Log Facies Interpretation

Detailed log facies studies of Miocene succession in the well Rashidpur-04, Rashidpur structure, Surma Basin were carried out by integrating wireline log and limited core sample data in order to reconstruct the paleoenvironments of deposition. Based on the analysis of the log motifs, grain size, sand/shale ratio and major change in gamma ray log motifs, two major depositional sequences were ide...

Constrained Seismic Sequence Stratigraphy of Asmari - Kajhdumi interval with well-log Data

Sequence stratigraphy is a key step in the interpretation of seismic reflection data. It was originally developed by seismic specialists, and the use of high-resolution well logs and core data was later taken into consideration in its implementation. The current paper aims to perform sequence stratigraphy using three-dimensional seismic data, well logs (gamma ray, sonic, porosity, density, ...

Functional motifs in Escherichia coli NC101

Escherichia coli (E. coli) bacteria can damage the DNA of gut lining cells and, according to recent reports, may encourage the development of colon cancer. Genetic switches are specific sequence motifs, and many of them are drug targets. It is therefore of interest to identify motifs and their locations in sequences. In the present study, the Gibbs sampler algorithm was used in order to predict and find functional m...

Journal:
  • Proceedings. International Conference on Intelligent Systems for Molecular Biology

Volume 2

Publication date: 1994